In this report, a 2D resource evaluation of a Co-Ni deposit is conducted. In the process, different interpolation and cosimulation methods are compared with regard to their error. The assigned files contain several layers of data from different parts of the same nickel laterite deposit (Murrin Murrin, Western Australia). Four lithologies are reported:
The metallogenetic model assumes that Ni (and to a lesser degree Co) is leached from the FZ and concentrated in the SA and SM zones, while Mg is removed from the system entirely. Consequently, there are systematic changes in the composition of the ore depending on the zone.
This work consists of analysing this data set with an exploratory data analysis, a spatial exploratory analysis, variography, indicator kriging, cokriging and cosimulation, as well as cross-validation, in order to evaluate the resource distribution in one slice of the deposit and to obtain an overview of the effect of choosing different interpolation methods. In total, six methods are compared: three kriging methods and three Gaussian simulation methods. The methods are as follows:
| Kriging | | Simulation | |
|---|---|---|---|
| VK1 | Cokriging w. log-transformed values | VS1 | Cosimulation w. log-transformed values |
| VK2 | Mixed-model cokriging w. log-transformed values and lithology | VS2 | Mixed-model cosimulation w. log-transformed values and lithology |
| VK3 | Logratio-composition based cokriging | VS3 | Logratio-composition based cosimulation |
| IK | Indicator kriging for lithology with anisotropic model | | |
While this testing plan does not follow a Design of Experiments approach directly, it can roughly be viewed as a two-dimensional testing plan with two levels for one factor (kriging vs. simulation) and three levels for the other (the model variant).
The variants are then compared to each other, and a final model is chosen and applied for the interpretation of the results. The selection of the final model is based on the mean absolute error and cross-validation scatterplots.
For the final model, grade-tonnage curves are computed to estimate the resource potential of the area. Since the final model uses simulation methods, a multitude of grade-tonnage curves is produced to estimate the variation and uncertainty of the resource estimate.
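How a family of grade-tonnage curves can be derived from simulated realizations can be sketched as follows. This is a minimal sketch with made-up lognormal grades and an assumed uniform block tonnage (`sims` and `t_block` are hypothetical, not the actual deposit data):

```r
# Hedged sketch: grade-tonnage curves from simulated realizations.
# `sims` is a toy matrix (blocks x realizations) of simulated Ni grades;
# each block is assumed to carry the same tonnage `t_block`.
set.seed(1)
sims <- matrix(rlnorm(1000 * 10, meanlog = -0.5, sdlog = 0.6), ncol = 10)
t_block <- 1000  # tonnes per block (assumed)

cutoffs <- seq(0, 2, by = 0.1)
gt_curve <- function(grades, cutoffs, t_block) {
  t(sapply(cutoffs, function(z) {
    sel <- grades >= z
    c(cutoff = z,
      tonnage = sum(sel) * t_block,                          # tonnage above cutoff
      mean_grade = ifelse(any(sel), mean(grades[sel]), NA))  # mean grade above cutoff
  }))
}
# one curve per realization; their spread quantifies the uncertainty
curves <- lapply(seq_len(ncol(sims)), function(j) gt_curve(sims[, j], cutoffs, t_block))
```

The spread between the curves across realizations is what provides the uncertainty statement for the resource estimate.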
The dataset is part of the data of a study published as: [1] Talebi, H., Mueller, U., Tolosana-Delgado, R. et al. Geostatistical Simulation of Geochemical Compositions in the Presence of Multiple Geological Units: Application to Mineral Resource Evaluation. Math Geosci 51, 129–153 (2019). https://doi.org/10.1007/s11004-018-9763-9
Notice: The nature of this work is largely driven by graphical assessments and graphing. To maintain a certain brevity, some subsections of this work are structured in parallel. This means that the user can activate the subsection she or he wishes to see. This also allows showing details not necessary for the core understanding of the work. Code sections used to compile the calculations can be expanded by clicking on Code.
For the assembly of this work, the following packages are used:
#install.packages("funModeling")
#install.packages("gridExtra")
#install.packages("corrplot")
#install.packages("kableExtra")
#install.packages("vcd")
#install.packages("cvms") # for confusion metrics and matrices
#install.packages("plotly") # for interactive plots, is not loaded into namespace since it overwrites some dplyr functions
#install.packages("ggnewscale")
#install.packages("rasterly")
library("rasterly") # fast plotting of larger spatial datasets with plotly
library("ggnewscale") # allows combination of cont. and factorial data in one ggmap
library("sp") # contains spatial data classes
library("gstat") # classical geostatistics
library("RColorBrewer") # color palettes
library("magrittr") # pipe %>%
library("dplyr") # data manipulation verbs
library("gmGeostats") # gstat re-interfaced / for swath plots
library("ggplot2") # nicer plots
library("data.table") # for fast data frame manipulation
library("knitr") # for nicer report formatting
library("kableExtra") # for table formatting
library("corrplot") # for correlation visualisation
library("funModeling") # helper functions for exploratory data analysis
library("compositions") # compositional modelling
library("gridExtra") # for arranging multiple ggplots in a grid similar to par(mfrow)
library("vcd") # slightly improved optics for mosaic plot
library("readr") # more comfortable read csv fun
The provided dataset consists of 12 columns and 735 observations. By default, the columns X1-Ni are imported as numerical. “Hard”, “Lcode” and “subset” can be considered factorial variables and are therefore directly converted accordingly for further use.
#import
dt <- read_csv("Grafe.csv") %>% as.data.table()
#convert columns to factor
fkt <- c("Hard", "Lcode", "subset" )
dt[,(fkt) := lapply(.SD, factor), .SDcols = fkt]
#print data table
dt %>% head() %>% kable(table.attr = "style = \"color: black;\"") %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed"))
| X1 | X | Y | Al | Co | Fe | Filler | Mg | Ni | Hard | Lcode | subset |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 23 | 1325 | 225 | 4.70 | 0.001 | 41.50 | 53.685 | 0.09 | 0.024 | 3 | FZ | training |
| 32 | 1325 | 325 | 1.43 | 0.011 | 11.93 | 78.699 | 7.70 | 0.230 | 0 | FZ | training |
| 52 | 1300 | 250 | 3.60 | 0.004 | 19.70 | 76.259 | 0.39 | 0.047 | 3 | FZ | training |
| 76 | 1275 | 175 | 5.10 | 0.031 | 30.90 | 62.909 | 0.64 | 0.420 | 3 | FZ | training |
| 100 | 1275 | 225 | 1.60 | 0.012 | 3.50 | 81.479 | 13.10 | 0.309 | 2 | FZ | training |
| 123 | 1275 | 275 | 2.80 | 0.029 | 17.20 | 71.044 | 8.24 | 0.687 | 1 | FZ | training |
Generally, three groups of data are present in the dataset:
The summary statistics for the numeric variables are shown in the following table.
options(knitr.kable.NA = '', digits=2, scipen=999)
sumstats <- profiling_num(dt) %>% #convenient function to calculate most standard summary stats
select(mean, std_dev, variation_coef, p_25, p_50, p_75, skewness, kurtosis) %>% #select wanted stats
cbind (.,data.frame(min = sapply(dt[,.SD, .SDcols = is.numeric], function(x) min(x, #add min / max
na.rm = T)),
max = sapply(dt[,.SD, .SDcols = is.numeric], function(x) max(x,
na.rm = T)) ))
names(sumstats) <- c("Mean", "Std. Dev.","Var. Coef.", "Q1", "Q2/Median", "Q3", "Skewness", "Kurtosis", "min", "max")
#improve look and print
sumstats %>%
kable( table.attr = "style = \"color: black;\"") %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed"))
| Mean | Std. Dev. | Var. Coef. | Q1 | Q2/Median | Q3 | Skewness | Kurtosis | min | max | |
|---|---|---|---|---|---|---|---|---|---|---|
| X1 | 9292.05 | 5313.11 | 0.57 | 4303.50 | 8693.00 | 14329.50 | -0.31 | 1.4 | 23.00 | 14513.0 |
| X | 589.97 | 346.76 | 0.59 | 300.00 | 550.00 | 850.00 | 0.32 | 2.0 | 0.00 | 1325.0 |
| Y | 355.07 | 174.26 | 0.49 | 225.00 | 325.00 | 475.00 | 0.54 | 2.5 | 25.00 | 850.0 |
| Al | 4.84 | 3.57 | 0.74 | 1.50 | 4.20 | 7.60 | 0.55 | 2.4 | 0.10 | 17.5 |
| Co | 0.07 | 0.12 | 1.85 | 0.02 | 0.03 | 0.06 | 5.35 | 43.4 | 0.00 | 1.4 |
| Fe | 23.21 | 13.04 | 0.56 | 10.60 | 22.30 | 34.80 | 0.20 | 1.7 | 1.00 | 54.9 |
| Filler | 65.78 | 10.63 | 0.16 | 56.34 | 67.64 | 74.01 | -0.10 | 2.1 | 40.33 | 93.8 |
| Mg | 5.40 | 6.31 | 1.17 | 0.24 | 1.41 | 11.14 | 0.85 | 2.3 | 0.06 | 24.1 |
| Ni | 0.70 | 0.51 | 0.72 | 0.33 | 0.58 | 0.97 | 1.24 | 4.7 | 0.01 | 3.0 |
It stands out that Co exhibits both a very high skewness and kurtosis. This points towards a highly skewed distribution with possible outliers. Fe and Filler exhibit a relatively low skewness. The distributions of the variables will be investigated further below. X1, the drillhole ID, will be dropped from now on.
Regarding the units of the variables in the dataset, it can be assumed that the values of the chemical elements are in weight-percent and that the spatial coordinates are in metres. The exact coordinate system is not known, but it appears to be a local Cartesian coordinate system. It is not known whether X denotes Northing or Easting. For the sake of simplification, it is chosen to be the same as the traditional X-axis in plots.
Furthermore, the chemical variables and the filler sum up to 100 %. This allows the chemical components to be treated as compositional data.
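This closure can be verified directly. A minimal sketch using the first assay row from the data table above (values assumed to be weight-percent); the loaded compositions package would represent such rows as Aitchison compositions via `acomp()`:

```r
# Hedged sketch: checking compositional closure for one assay row
# (values copied from the first data-table row shown above).
comp <- data.frame(Al = 4.70, Co = 0.001, Fe = 41.50,
                   Filler = 53.685, Mg = 0.09, Ni = 0.024)
closed <- abs(rowSums(comp) - 100) < 1e-6  # TRUE if the parts sum to 100 %
```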
While histograms are a common and familiar technique to evaluate distributions, they are harder to read when data from multiple groups are expressed in one plot. Because of this, density plots are chosen for visualisation in this work. This allows grouping the data according to their lithology.
#prepare dt
colnames.num <- names(dt[,.SD, .SDcols = is.numeric][,-c("X1", "X", "Y")])
dt.m <- melt.data.table(dt[,.SD, .SDcols = c(colnames.num, "Lcode")],
                        id.vars = c("Lcode"))
#draw density plots
a <- ggplot(data = dt.m, aes( x=value, fill=Lcode)) +
  geom_density( alpha = 0.3, aes(y = ..density..))+
  facet_grid(scales = "free", rows = vars(variable))
a
This shows that for all variables except Fe and Filler, logarithmic scales seem preferable. This is in line with the findings from the summary statistics. Hence, in the following, the log-transformed results are shown:
#prepare dt
colnames.num <- names(dt[,.SD, .SDcols = is.numeric][,-c("X1", "X", "Y")])
dt.m <- melt.data.table(dt[,.SD, .SDcols = c(colnames.num, "Lcode")],
                        id.vars = c("Lcode"))
#draw density plots of log-transformed values
a <- ggplot(data = dt.m, aes( x=log(value), fill=Lcode)) +
  geom_density( alpha = 0.3, aes(y = ..density..))+
  facet_grid(scales = "free", shrink=F, rows = vars(variable))
a
Both from the general density plots as well as from the log-density plots, it can be seen that the distributions differ between the lithologies. For almost all elements, differences in the empirical density functions can be observed: both the spread of the distributions and the measures of centrality differ. The observed differences are discussed further in the subsequent sections.
It can also be seen that a logarithmic scale provides distributions that are less skewed and closer to a normal distribution, which is a prerequisite for the kriging analysis.
To confirm this, QQ-plots are drawn for the untransformed data:
{par(mfcol=c(2,3))
qqnorm((dt$Al), col=2);qqline((dt$Al));legend(legend = "Al","topleft")
qqnorm((dt$Co), col=2);qqline((dt$Co));legend(legend = "Co","topleft")
qqnorm((dt$Fe), col=2);qqline((dt$Fe));legend(legend = "Fe","topleft")
qqnorm((dt$Filler), col=2);qqline((dt$Filler));legend(legend = "Filler","topleft")
qqnorm((dt$Mg), col=2);qqline((dt$Mg));legend(legend = "Mg","topleft")
qqnorm((dt$Ni), col=2);qqline((dt$Ni));legend(legend = "Ni","topleft")}
Large deviations can be seen for most distributions. Especially the tail behaviour differs from that of a normal distribution. Therefore, the data are log-transformed:
dt.orig <- dt #backup for debugging
fkt <- c("Al", "Co","Fe", "Filler", "Mg", "Ni") #columns to transform
fkt_new <- c("Al_log", "Co_log","Fe_log", "Filler_log", "Mg_log", "Ni_log") #names of the transformed columns
dt <- dt[,(fkt_new) := lapply(.SD, log), .SDcols = fkt] #actual transformation with data.table
Now the QQ-plots for the log-transformed data are as follows:
{par(mfcol=c(2,3))
qqnorm((dt$Al_log), col=2);qqline((dt$Al_log));legend(legend = "Al_log","topleft")
qqnorm((dt$Co_log), col=2);qqline((dt$Co_log));legend(legend = "Co_log","topleft")
qqnorm((dt$Fe_log), col=2);qqline((dt$Fe_log));legend(legend = "Fe_log","topleft")
qqnorm((dt$Filler_log), col=2);qqline((dt$Filler_log));legend(legend = "Filler_log","topleft")
qqnorm((dt$Mg_log), col=2);qqline((dt$Mg_log));legend(legend = "Mg_log","topleft")
qqnorm((dt$Ni_log), col=2);qqline((dt$Ni_log));legend(legend = "Ni_log","topleft")}
It can be seen that for:
As a result, all variables, including Fe and Filler, will be used in their log-transformed versions. This way, all variables can be compared with each other.
The summary for the categorical variables is shown below. Of relevance to the understanding of this dataset are the two variables Hard and Lcode. Additionally, a variable subset exists. It stores the information which part of the data set is used for training and which is used for validation of a modelling process. It is distributed almost 50-50 across the dataset. The table below shows the number of members in the respective groups.
dt[,.SD, .SDcols = is.factor] %>% summary %>% kable(table.attr = "style = \"color: black;\"") %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed"))
| Hard | Lcode | subset | |
|---|---|---|---|
| 0: 53 | FZ:334 | training :371 | |
| 1:394 | SA: 65 | validation:364 | |
| 2:166 | SM:278 | ||
| 3:122 | UM: 58 |
The interrelationship of hardness and lithology is plotted with a mosaic plot. Hereby, all possible combinations of levels for Lcode and Hard are plotted against each other. The size of the cells along each of the two axes corresponds to the number of members for that level. Additionally, the Pearson residuals are plotted as blue and red shades. A positive Pearson residual corresponds to more members in this level combination than the null hypothesis would suggest, the null hypothesis being that no particular association exists between the groups. More information can be found in the supplement to the lecture [3], or in [4]. In addition to the mosaic plot, the numeric values of the group combinations are shown in the contingency table.
#draw mosaic plot
vcd::mosaic(~Hard+Lcode, data = dt, shade=TRUE, legend=TRUE,direction="v",)
#calculate contingency table
ct <- dt[,.SD, .SDcols = c("Hard", "Lcode")] %>%
table() %>% as.data.frame.array()
#add colsums
csum <- ct %>% apply(.,2,sum)
ct["Sum",] <- csum
#add rowsums
rsum <- ct %>% apply(.,1,sum)
ct$Sum <- c(rsum)
#print
ct %>% kable(table.attr = "style = \"color: black;\"") %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed"))
| FZ | SA | SM | UM | Sum | |
|---|---|---|---|---|---|
| 0 | 19 | 10 | 21 | 3 | 53 |
| 1 | 169 | 43 | 181 | 1 | 394 |
| 2 | 81 | 6 | 63 | 16 | 166 |
| 3 | 65 | 6 | 13 | 38 | 122 |
| Sum | 334 | 65 | 278 | 58 | 735 |
It shows that rock hardness 3 is positively associated with UM (fresh ultramafic zone) and negatively with SM (smectitic zone). Other than that, weaker associations seem to exist between SA (saprolitic zone) and hardness 0, as well as between SM and hardness 1. Especially the association with hardness level three is plausible, because fresh unweathered rock is generally harder and more robust than its weathered counterparts. In this dataset, the hardness level could also identify or incorporate indicators of rock mass quality, as details on the nature of the variable Hard are not known.
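The shading of the mosaic plot can be reproduced numerically. A minimal sketch computing the Pearson residuals from the counts of the contingency table above with base R's `chisq.test`:

```r
# Hedged sketch: Pearson residuals (observed - expected) / sqrt(expected),
# the quantity shaded blue/red in the mosaic plot.
# Counts copied from the contingency table above.
ct <- matrix(c( 19, 10,  21,  3,
               169, 43, 181,  1,
                81,  6,  63, 16,
                65,  6,  13, 38),
             nrow = 4, byrow = TRUE,
             dimnames = list(Hard = 0:3, Lcode = c("FZ", "SA", "SM", "UM")))
chi <- chisq.test(ct)
chi$residuals  # e.g. strongly positive for Hard 3 / UM
```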
To identify the association of the elements with the lithologies, boxplots after Tukey are produced:
#for grouped stats with ggplot, melted data tables are a convenient option
dt.m <- melt.data.table(dt[,.SD, .SDcols = c(colnames.num, "Lcode")],
id.vars = c("Lcode"))
ggplot(data = dt.m, aes( y=log(value),group=Lcode, fill=Lcode)) +
geom_boxplot( alpha = 0.3, notch = T)+
facet_wrap(scales = "free_y", shrink=F, ~variable, ncol=3)+
theme(axis.title.x=element_blank(),
axis.text.x=element_blank(),
axis.ticks.x=element_blank())
It can clearly be seen that the element components differ between the rock types. For Al and Fe, the highest values are found in zone FZ, followed by SA, SM and UM in decreasing order. Mg and Filler, however, behave oppositely, with the lowest values found in FZ and with SA, SM and UM showing values in increasing order. Co and Ni show a different behaviour: here the highest values are found in SA, followed by SM. The association of SA with Ni and Co is already known from the geological description of the site and can be confirmed here. FZ and UM show similarly low values of the two elements. The notches give a graphical indication of whether the differences in the medians are significant: when the notches do not overlap, there is strong evidence that the medians differ. According to that, the differences between:
do not differ significantly at a confidence level of 95 %.
The results, however, show that generally a relatively strong influence of the lithology on the mineralization can be assumed.
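The notch heuristic used above can be made explicit. A minimal sketch with toy values, using the convention implemented in R's `boxplot.stats` and ggplot2 of median ± 1.58 · IQR / sqrt(n):

```r
# Hedged sketch: notch limits of a Tukey boxplot (toy Ni-like values,
# not the assay data). Non-overlapping notches between two groups give
# rough evidence that the medians differ.
x <- log(c(0.23, 0.42, 0.69, 0.31, 0.55, 0.97, 0.11, 0.33))
n <- length(x)
notch <- median(x) + c(-1.58, 1.58) * IQR(x) / sqrt(n)
notch  # lower and upper notch limit around the median
```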
To identify whether similar findings hold for the rock hardness, the procedure is repeated with Hard as the grouping variable.
#for grouped stats with ggplot, melted data tables are a convenient option
dt.m <- melt.data.table(dt[,.SD, .SDcols = c(colnames.num, "Hard")],
id.vars = c("Hard"))
ggplot(data = dt.m, aes( y=log(value), fill=Hard)) +
geom_boxplot( alpha = 0.3, notch = T)+
facet_wrap(scales = "free", shrink=F, ~variable, ncol=3)
dt.m <- melt.data.table(dt[Lcode=="SA",.SD, .SDcols = c(colnames.num, "Hard")], # Example for detailed view for Lcode="SA"
id.vars = c("Hard"))
ggplot(data = dt.m, aes( y=log(value), fill=Hard)) +
geom_boxplot( alpha = 0.3, notch = F)+
facet_wrap(scales = "free", shrink=F, ~variable, ncol=3)+
labs(title = "Composition parts vs. Hardness for Lcode SA")
While some differences can be seen, the general picture is less distinct than for the rock type. UM tends to be the hardest zone and also tends to contain more Mg, but this does not show in the hardness.
For Co, the picture at first sight is very similar to the results from the boxplots grouped by Lcode. Zone SA (which is relatively more associated with hardness 0) showed the highest Co content; here, however, hardness 1 shows the highest Co contents. So a simple relationship cannot be assumed. Interactions between lithology and hardness could also be assumed.
For example, within zone SA, an even stronger accumulation of Co could be associated with a certain rock hardness. This is shown exemplarily in the second boxplot. There, the notches are removed because the results are partially so skewed that the notches would extend beyond the IQR. Generally, a similar picture can be observed for the elements Co, Mg and Ni: a stronger accumulation of these elements occurs at hardness 1. The composition of a rock mass can influence the hardness even at a small scale. Some indicators of this during mechanical excavation were found by Rimos et al. [3] during a Master's thesis at TU BAF. There, the cutting resistance of pyritic Pb-Zn ore samples from Reiche Zeche was tested for cutting with conical picks. It showed a higher cutting resistance in areas in which increased Ca contents were found. The element contents were approximated on a dense pattern by means of a handheld XRF device.
However, due to the complexity this would add to the model building, this influence is dropped, as it would exceed the scope of this work.
# this is completed if necessary
# options(scipen = 5)
# i <- 1
# for (i in 1:length(levels(dt$Lcode))) {}
# contrasts(dt$Lcode) <- contr.treatment(levels(dt$Lcode), base=i)
# u <- levels(dt$Lcode)[[i]]
# lm1 <- lm(dt,formula = Mg~Lcode)
# lm1 <- lm1%>%summary
#
# names(lm1[["coefficients"]][,1])
# lm1[["coefficients"]][,1]
# lm1[["coefficients"]][,4]
The correlation and covariance statistics for the numeric variables are presented as follows. The correlation is calculated twofold: as Spearman correlation and as Pearson correlation. The Pearson correlation measures the strength of the linear relationship between two variables; a Pearson coefficient of 1 corresponds to an R^2 of 1 for a linear regression. The Spearman correlation coefficient follows the same basic equation but takes into consideration the ranks of the data pairs instead of their actual values. As such, a Spearman correlation of 1 means that the relation between the pair is a perfectly monotonically increasing function. Consequently, a high Spearman and a low Pearson coefficient indicates, e.g., a quadratic relation. A more detailed description of the two methods can be found, e.g., in [2].
Values in the upper triangle show the respective Spearman correlation coefficients, values in the lower part show the Pearson correlation coefficients. Correlations that do not satisfy a significance threshold of p < 0.05 are marked with a black cross. These correlations can be considered insignificant.
All calculations are done with the log-transformed values, as these will be used for the further interpolation - except for the spatial variables X and Y. For comparison, the covariance matrix is also computed.
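The rank-based nature of the Spearman coefficient can be illustrated with a minimal sketch (toy vectors, not the assay data):

```r
# Hedged sketch: Spearman is Pearson computed on ranks. A perfectly
# monotone but non-linear relation separates the two coefficients.
x <- 1:20
y <- x^2                                # monotone increasing, but not linear
sp <- cor(x, y, method = "spearman")    # 1: perfect monotone relation
pe <- cor(x, y, method = "pearson")     # < 1: relation is not linear
# Spearman equals Pearson applied to the ranks:
same <- all.equal(sp, cor(rank(x), rank(y), method = "pearson"))
```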
use <- cbind(dt[,2:3], dt[,4:9] %>% log())
dt.ptest.sp <- cor.mtest(use,method="spearman")
dt.cor.sp <- cor(use,method = "spearman")
dt.ptest.p <- cor.mtest(use,method="pearson")
dt.cor.p <- cor(use,method = "pearson")
corrplot(dt.cor.sp,p.mat = dt.ptest.sp$p,
method = "color", outline=T,
type="upper",
addgrid.col = "grey", tl.col = "black",
tl.pos= "td",
addCoef.col="black",
sig.level = 0.05,
order="original", hclust.method = "complete", addrect = 6, rect.col = "black",
mar =c(1,0,2,1))
corrplot(dt.cor.p,p.mat = dt.ptest.p$p,
method = "color", outline=T,
type="lower",
addgrid.col = "grey", tl.col = "black",
tl.pos= "lt",
addCoef.col="black",
cl.pos="n",
sig.level = 0.05,
order="original", hclust.method = "complete", addrect = 6, rect.col = "black",
mar =c(1,0,1,1), add=T)
mtext(side = 3,line = 2, "Top: Spearman")
mtext(side = 2, "Bottom: Pearson")
Generally, it can be seen that the Pearson and Spearman correlations show similar results. In some cases, the Spearman correlation gives higher values than Pearson, which follows from its definition. However, the opposite can also be observed here. When the Pearson correlation is higher than the Spearman correlation, this speaks for deviations from monotonicity that are so small that they reduce the rank-based Spearman coefficient more than the Pearson coefficient. When the Spearman correlation is higher, a monotonic but not quite linear relation is present.
The variables X and Y show very low correlations with the geochemical variables, and some of them are also insignificant. This speaks for the absence of a general spatial trend. This will be investigated in more detail later.
Some correlation pairs show a higher correlation after Spearman. This applies mainly to Mg/Ni, Al/Ni and Filler/Fe. Others, like Mg/Filler and Fe/Al, show a higher correlation after Pearson.
Generally, within the group of chemical variables, the pairs show the following noteworthy correlations:
It is noteworthy that Al correlates positively only with Fe and negatively with Filler; this speaks for the fact that it occurs as a companion of the iron and not as its own silicate mineralization. This is in line with the geological description after [1], which states a lateritic zone. Co and Ni show a positive relation, while at the same time Ni shows a negative relation to Al. This speaks for the fact that these two form their own mineralization complex.
To allow for a comparison of the variance values, the covariance is calculated for the log-transformed values. Otherwise, the variables would be on different scales; through the quadratic nature of the variance, this would lead to values that are not comparable between the groups.
use <- dt[,4:9] %>% log()
dt.cov <- cov(use,method = "pearson")
corrplot(dt.cov,
method = "color", outline=T,
type="upper",
addgrid.col = "grey", tl.col = "black",
tl.pos= "td",
addCoef.col="black",
sig.level = 0.05,
order="original", hclust.method = "complete", addrect = 6, rect.col = "black",
mar =c(1,0,2,1),is.corr = F,
number.digits=2, number.cex = 0.8,
cl.align.text = "l")
The table shows results similar to the correlation analysis. However, it also shows the magnitude of variation, since it is a non-standardized version of the correlation. As such, Fe and Filler generally do not vary much, although they show the highest absolute values. The highest absolute covariance values appear for the pairs Al-Mg and Fe-Mg; since the values are negative, the correlation is also negative. Also noteworthy is the pair Fe/Filler, which shows a strong correlation but a weak covariance. This can be attributed to the low standard deviation, especially of Filler.
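The relation between the two matrices can be sketched directly, since cov(x, y) = cor(x, y) · sd(x) · sd(y). A minimal sketch with toy data of small spread, loosely mimicking the Fe/Filler case (not the assay data):

```r
# Hedged sketch: covariance is the unstandardised correlation, so a
# strongly correlated pair with small spread still shows small covariance.
set.seed(42)
x <- rnorm(100, sd = 0.1)           # small spread, like log(Filler)
y <- 2 * x + rnorm(100, sd = 0.02)  # strongly linearly related
r  <- cor(x, y)                     # high correlation
cv <- cov(x, y)                     # small covariance nonetheless
same <- all.equal(cv, r * sd(x) * sd(y))
```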
A scatter plot matrix can shed more light on the cross-relations between the covariates. It is a tool that can give a broad understanding of the situation and allows for a more detailed picture than a mere correlation analysis. Here, the data are grouped according to the rock type to identify whether the correlation behaviour differs between the rock types. In the upper part, smoothed (loess) regression lines are drawn. On the diagonal, the density plots already shown previously are drawn. In the lower section, raw scatterplots are drawn.
While the scatter plot matrix unveils the behaviour of the data according to their Lcode, the global swath plots are also drawn.
library(GGally)
bigplot <- ggpairs(data = dt,columns = c(2,3,13:18),
mapping = ggplot2::aes(colour=dt$Lcode),
lower = list(continuous = wrap("points", alpha = 0.8, size=0.3)),
diag = list(discrete="barDiag",
continuous = wrap("densityDiag", alpha=0.5 )),
upper = list(continuous = wrap("smooth_loess", alpha = 0.1, size=0.05)
)
) +
theme(panel.grid.major = element_blank())
(bigplot)
The following points can be summarized from the scatter plot matrix:
The clear correlation between Fe and Filler is repeated.
The scatter plots indicate clustered structures according to the rock type, where the behaviour can be relatively different for the different rock types. Comparing the regression lines to the correlation coefficients and the scatter plots, it shows that for most pairs the regression relationships are relatively weak and the data appear in clusters rather than in trends.
X <- dt[,c("X", "Y")]
swath(data = dt[,..fkt_new], loc=X$X)
For Mg_log, an area with deviating values can be seen at low X-values. Slight trends can also be observed for Fe, Filler and, to an extent, Ni.
X <- dt[,c("X", "Y")]
swath(data = dt[,..fkt_new], loc=X$Y,xlab = "Y", )
In the Y-direction, strong local trends appear at low Y-values.
These findings somewhat speak for the incorporation of directional trends, especially in the Y-direction.
In the following, the input variables are described with respect to their spatial distribution.
The following maps plot the Hard and Lcode variables as dot plots. The vertical axis represents the coordinate Y and the horizontal axis represents X. Only the training data are plotted.
dt.m <- melt.data.table(dt[subset=="training",.SD, .SDcols = c("Lcode", "Hard", "X", "Y")],
                        id.vars = c("X", "Y"))
temp <- list()
for (i in unique(dt.m$variable)) {
temp[[i]] <- ggplot(dt.m[variable==(i)], aes(x = X, y = Y, color = value)) +
geom_point(shape=19)+
coord_fixed()+
ggtitle(label = i) #+
#scale_color_gradient(low = "blue2",high = "red2")
}
do.call("grid.arrange", c(temp, ncol=2))
# par(mfrow=c(1,2))
#
# plot(Y~X, data=dt, pch=15, col=1+as.integer(Hard), asp=1, cex=0.5)
# legend("topleft", legend=levels(dt$Hard), fill=2:6)
#
# plot(Y~X, data=dt, pch=15, col=1+as.integer(Lcode), asp=1, cex=0.5)
# legend("topleft", legend=levels(dt$Lcode), fill=2:6)
It can be seen that the variable Hard is distributed relatively evenly. However, Hard 2 and 3 seem to accumulate towards the east of the field. For Lcode, an accumulation of FZ can be seen in a central E-W band. SM is accumulated more towards the western as well as the northern and southern borders of the project. Of importance are also the levels UM and SA. They are relatively sparsely distributed: UM can be found in thin bands along the ESE-WNW axis, and SA in single patches in areas where FZ and SM seem to show contact zones. Since it was already shown that the element contents are affected by the Lcode, this speaks for the fact that trends need to be investigated.
To obtain an understanding of the distribution of the different elements, the values are plotted as follows:
dt.m <- melt.data.table(dt[,.SD, .SDcols = c(fkt_new, "X", "Y")],
                        id.vars = c("X", "Y"))
dt.m
## X Y variable value
## 1: 1325 225 Al_log 1.55
## 2: 1325 325 Al_log 0.36
## 3: 1300 250 Al_log 1.28
## 4: 1275 175 Al_log 1.63
## 5: 1275 225 Al_log 0.47
## ---
## 4406: 50 675 Ni_log -2.73
## 4407: 50 825 Ni_log -0.93
## 4408: 25 150 Ni_log -0.39
## 4409: 25 700 Ni_log -1.71
## 4410: 0 125 Ni_log -0.55
temp <- list()
for (i in unique(dt.m$variable)) {
temp[[i]] <- ggplot(dt.m[variable==(i)], aes(x = X, y = Y, color = value)) +
geom_point()+
coord_fixed()+
ggtitle(label = i)+
scale_color_gradientn(colors = c("blue2","purple", "red2"))
}
do.call("grid.arrange", c(temp, ncol=3))
It can be seen from the maps that areas with high concentrations of Al tend to show low concentrations of Co, Mg and, to some extent, Ni. The Filler variable also exhibits this behaviour. Generally, it seems that two different groups of mineralization exist: one group with high Al and Fe concentrations and one with high Mg, Ni, Co and Filler concentrations. This is also in line with earlier findings, where the rock type corresponds with certain element levels.
For the subsequent geostatistical modelling procedures, the bin width and maximum variogram distance are estimated as follows. For this estimation, only the assay points of the “training” part of the data set are used.
The minimum distance between sample sites is ca. 35 m. This value will be used as the bin size. The maximum can be estimated with the median distance between assay sites, or graphically with the radius of the biggest circle that fits within the boundaries of the survey area. The median distance is 431.57 m. The radius of the biggest circle fitting in the study area is estimated at ca. 230 m. As a tradeoff between the two values, a cutoff of 300 m is chosen, since the area has a relatively elongated shape.
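The distance-based part of this estimation can be sketched as follows (toy coordinates; in the report, `dt[subset == "training", c("X", "Y")]` would be used instead):

```r
# Hedged sketch: candidate bin width and cutoff from pairwise distances
# (toy 35 m grid fragment, not the actual drillhole collars).
xy <- data.frame(X = c(0, 35, 70, 0, 35),
                 Y = c(0,  0,  0, 35, 35))
d <- dist(xy)        # all pairwise Euclidean distances
bin_width <- min(d)  # minimum inter-sample distance -> bin size
upper_est <- median(d)  # median distance -> upper bound candidate for the cutoff
```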
For reference, the subsequent plot shows the training and validation data locations.
a <- ggplot(dt, aes(x = X, y = Y, color = subset)) +
geom_point()+
coord_fixed()+
ggtitle(label = "Interactive Plot Training/Validation Data Set")
plotly::ggplotly(a)
For the interpolation of the categorical variable, ordinary indicator kriging is used. No cokriging is used in this case; the indicator variables are modelled individually. A version of indicator cokriging was included in the mixed-model cokriging (VK2). Here, the sequential fitting of uncorrelated anisotropic variograms was used; as such, no interaction between the different rock zones was assumed. This allows a more exact fitting of the individual semivariogram models. It also allows a certain comparison between the interpretation results of the mixed model and the sequential indicator kriging. Shown below is one of the fitted variograms.
#prepare grid for all further use
options(scipen = 5, digits = 6)
extends <-dt[,c("X", "Y")] %>% sapply(range) %>% as.data.frame()
x = seq(from=min(extends$X)-100, to=max(extends$X)+100, by=10)
y = seq(from=min(extends$Y)-100, to=max(extends$Y)+100, by=10)
xy_grid = expand.grid(X=x, Y=y)
# create the gstat containers
gs_i_FZ <- gstat(id="FZ", formula = (Lcode=="FZ")~1, locations = ~X+Y, data=dt[subset=="training"], nmax = 20, omax=6, maxdist = 300)
gs_i_SA <- gstat(id="SA", formula = (Lcode=="SA")~1, locations = ~X+Y, data=dt[subset=="training"], nmax = 20, omax=6, maxdist = 300)
gs_i_SM <- gstat(id="SM", formula = (Lcode=="SM")~1, locations = ~X+Y, data=dt[subset=="training"], nmax = 20, omax=6, maxdist = 300)
gs_i_UM <- gstat(id="UM", formula = (Lcode=="UM")~1, locations = ~X+Y, data=dt[subset=="training"], nmax = 20, omax=6, maxdist = 300)
Lcodes <- c("FZ", "SA", "SM", "UM")
#initiate the variograms as loop for each LCode
for (i in c("FZ", "SA", "SM", "UM")){
assign(paste0("vg_i_", i),
variogram(get(paste0("gs_i_", i)),
width=35, cutoff=300, alpha=c(0:5)*30 ))
}
#fit FZ
#plot(vg_i_FZ)
vgt_FZ <- vgm(psill=0.1, model="Wav", range=120, nugget=0,anis = c(40,0.4),
add.to = vgm(psill=0.15, range=350, nugget=0.1, model="Gau"))
# plot(vg_i_FZ, model=vgt_FZ)
# fit SA
#plot(vg_i_SA)
vgt_SA <- vgm(psill=0.05, model="Wav", range=50, nugget=0.03,anis = c(0,0.4))
#plot(vg_i_SA, model=vgt_SA)
# fit SM
# plot(vg_i_SM)
vgt_SM <- vgm(psill=0.05, model="Wav", range=80, nugget=0.03,anis = c(0,0.4),
add.to = vgm(psill=0.15, range=350, nugget=0.1, model="Exp"))
#plot(vg_i_SM, model=vgt_SM)
#fit UM
# plot(vg_i_UM)
vgt_UM <- vgm(psill=0.05, model="Exp", range=150, nugget=0.02,anis = c(120,0.3),
add.to = vgm(psill=0.01, range=50, nugget=0, model="Wav",anis = c(30,0.3)))
plot(vg_i_UM, model=vgt_UM)
The figure shows the fitted variogram model for UM. To fit it, an overlay of an exponential and a wave model was used with varying anisotropy azimuths and ratios: the exponential part has an azimuth of 120° and the wave part of 30°. The figure also shows that a highly accurate fit could not be achieved; however, the main characteristics of the variogram structure could be captured. In summary, the following variogram models were used for the fitting:
With the fitted variograms, the actual indicator kriging based modelling was conducted:
#update gstats
for (i in Lcodes){
assign(paste0("gs_i_", i),
gstat(id=i,
formula = (Lcode==paste0(i))~1,
locations = ~X+Y,
data=dt[subset=="training"],
model = get(paste0("vgt_",i)),
nmax=30)
)
}
#kriging
for (i in Lcodes){
assign(paste0("krig_", i),
predict(get(paste0("gs_i_", i)),
newdata=xy_grid)
)
}
## [using ordinary kriging]
## [using ordinary kriging]
## [using ordinary kriging]
## [using ordinary kriging]
krig_i_s <- cbind(krig_FZ, krig_SA[,3:4], krig_SM[,3:4], krig_UM[,3:4])
krig_i_s$Lcode <- krig_i_s %>% select(contains(".pred")) %>% max.col %>%
factor(x = .,levels = c(1,2,3,4),labels = Lcodes)
krig_i_s %<>% as.data.table()
# applying variance cutoff of Q0.90
var_cutoff <- krig_i_s$FZ.var %>% quantile(0.90, na.rm = T)
krig_i_s <- krig_i_s[FZ.var < var_cutoff]
#Plotting the results
a <- ggplot(data = krig_i_s, aes(x=X, y=Y, fill=krig_i_s$Lcode))+
geom_raster()+
coord_fixed()+
geom_point(data=dt[subset=="training",], inherit.aes = F, aes(x=X, y=Y, fill=Lcode), size=3, shape=21)+
labs(fill="Lcode", title = "Lcode")
a
The image shows the interpolated results as well as the original sampling points. It can be seen that small “islands” appear for the SA areas. For UM, which is also relatively scarce, smaller patchy areas are modelled as well. At some sampling points, the modelled patches are so small that they do not visually extend beyond the overlaid sampling point. The quality of the interpolation is quantified with a confusion matrix as follows.
#kriging for validation data
for (i in Lcodes){
assign(paste0("krig_val_", i),
predict(get(paste0("gs_i_", i)),
newdata=dt[subset=="validation",])
)
}
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
krig_val_i_s <- cbind(krig_val_FZ, krig_val_SA[,3:4], krig_val_SM[,3:4], krig_val_UM[,3:4])
krig_val_i_s$Lcode <- krig_val_i_s %>% select(contains(".pred")) %>% max.col %>%
factor(x = .,levels = c(1,2,3,4),labels = Lcodes)
krig_val_i_s %<>% as.data.table()
CM_i <- cvms::confusion_matrix(targets =dt[subset=="validation", Lcode], predictions = krig_val_i_s$Lcode )
cvms::plot_confusion_matrix(CM_i$`Confusion Matrix`[[1]],
add_sums = TRUE,
add_normalized = F,
diag_percentages_only = T, add_row_percentages = F)
It can be seen that the accuracy is relatively mediocre. The confusion matrix shows the counts in the respective fields, the sum of all predictions per class, as well as the sum of the true classes (“Target”). The percentage values show the sensitivity for the respective class. Sensitivity is defined as the number of true positive predictions divided by the sum of true positives and false negatives. Additionally, the overall accuracy is calculated (not shown in the plot) as the number of correct predictions divided by the total number of cases [5]. For this model, the overall accuracy is 58.8%.
In the context of accuracy, it must also be mentioned that UM and SA appear relatively sparsely and their structures are small and thin. As such, a certain error can be expected when comparing against the validation set.
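Both quoted metrics can be reproduced from raw counts. A minimal sketch with made-up numbers (not the counts of this model):

```r
# Toy confusion matrix: rows = targets, columns = predictions (made-up counts).
cm <- matrix(c(40, 5, 3,
               6, 20, 4,
               2, 3, 17),
             nrow = 3, byrow = TRUE,
             dimnames = list(target = c("A", "B", "C"),
                             prediction = c("A", "B", "C")))
sensitivity <- diag(cm) / rowSums(cm)  # TP / (TP + FN), per class
accuracy <- sum(diag(cm)) / sum(cm)    # correct predictions / all cases
```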
In this chapter, omnidirectional cokriging with the log-transformed values is conducted. Since this model is supposed to serve as a “simplistic” baseline, neither anisotropy nor directional trends are incorporated.
As a first approach to modelling the data set, the omnidirectional variograms are fitted as follows.
#variables to use
vars_l = c("Al_log", "Co_log", "Fe_log", "Filler_log", "Mg_log", "Ni_log")
#loop for allocating values to gstat object - only training data
#--> slight change to the script: instead of prepopulating gs_l, it is initialised as NULL, so the specifics of the variable reading have to be typed only once
{gs_l=NULL
for(j in vars_l){
frm = as.formula(paste(j,1, sep="~"))
print(frm)
gs_l = gstat(gs_l, id=j, formula=frm, locations=~X+Y, data=dt[subset=="training"], nmax = 20, omax=6, maxdist = 300)
}
}
## Al_log ~ 1
## Co_log ~ 1
## Fe_log ~ 1
## Filler_log ~ 1
## Mg_log ~ 1
## Ni_log ~ 1
vge_l = gstat::variogram(gs_l,width=35, cutoff= 300) # create variograms
#create initial varmodels
vgt_l = vgm(psill=0.5, range=120, nugget=1, model="Sph", add.to=vgm(psill=0.8, range=300, model="Gau"))
#fitting
gs_l = gstat::fit.lmc(v=vge_l, g = gs_l, model = vgt_l, correct.diagonal = 1.00001)
#plot varplots + fits
plot(vge_l, model = gs_l$model)
Most variogram fits are considered sufficient. Some element combinations show problematic fits; however, their weight in the final result is considered very small, so the fits are accepted and the actual cokriging commences.
Predictions with a variance greater than the 0.95 quantile of the total variance of Fe are filtered out for display and further purposes.
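This masking reduces to a simple quantile threshold on the kriging variance; a toy sketch with arbitrary numbers:

```r
# Arbitrary variances for five cells; keep only cells below the 0.95 quantile,
# dropping the cell with the most extreme variance.
v <- c(0.2, 0.5, 0.9, 1.4, 3.0)
var_cut <- quantile(v, 0.95, na.rm = TRUE)
kept <- v[v < var_cut]
```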
#making grid with a margin of 100 m around extends of survey area
options(scipen = 5, digits = 6)
extends <-dt[,c("X", "Y")] %>% sapply(range) %>% as.data.frame()
x = seq(from=min(extends$X)-100, to=max(extends$X)+100, by=10)
y = seq(from=min(extends$Y)-100, to=max(extends$Y)+100, by=10)
xy_grid = expand.grid(X=x, Y=y)
#prediction
cokriged = predict(gs_l, newdata=xy_grid, debug.level=-1, nmax=20, omax = 6)
## Linear Model of Coregionalization found. Good.
## [using ordinary cokriging]
##
#cokriged %>% select(contains(".var")) %>% summary # for debugging
#sets the variance cutoff above which values are omitted
var_cutoff <- cokriged$Fe_log.var %>% quantile(0.95, na.rm = T)
#omitting the respective values
cokriged <- cokriged %>% as.data.table()
cokriged <- cokriged[Fe_log.var < var_cutoff]
cokriged_val <- predict(gs_l, newdata=dt[subset=="validation", c("X", "Y")], debug.level=-1, nmax=20, omax = 6)
## Linear Model of Coregionalization found. Good.
## [using ordinary cokriging]
##
#storing backtransformed kriging values
cokriged2 <- cokriged %>% select(contains(".pred")) %>% exp %>% cbind(cokriged[,c("X", "Y")], .)
Shown below are the interpolation results for the omnidirectional cokriging.
a <- ggplot(data = cokriged2, aes(x=X, y=Y, fill=cokriged2$Al_log.pred))+
geom_raster()+
coord_fixed()+
scale_fill_gradientn(colors = c("blue2","purple", "red2"))+
geom_point(data=dt[subset=="training",], inherit.aes = F, aes(x=X, y=Y), color="black", size=1)+
labs(fill="Al [%]")
a
Generally, only a few zones with higher Al contents exist, with contents of up to 12.5 %.
a <- ggplot(data = cokriged2, aes(x=X, y=Y, fill=cokriged2$Co_log.pred))+
geom_raster()+
coord_fixed()+
scale_fill_gradientn(colors = c("blue2","purple", "red2"))+
geom_point(data=dt[subset=="training",], inherit.aes = F, aes(x=X, y=Y), color="black", size=1)+
labs(fill="Co [%]")
a
Co is relatively scarce, with only three smaller areas where it is concentrated above 0.1 %. One area in the south-west especially stands out.
a <- ggplot(data = cokriged2, aes(x=X, y=Y, fill=cokriged2$Fe_log.pred))+
geom_raster()+
coord_fixed()+
scale_fill_gradientn(colors = c("blue2","purple", "red2"))+
geom_point(data=dt[subset=="training",], inherit.aes = F, aes(x=X, y=Y), color="black", size=1)+
labs(fill="Fe [%]")
a
Iron is relatively abundant in the deposit, with large areas containing above 35 % iron in the east and west, as well as small strips in the north.
a <- ggplot(data = cokriged2, aes(x=X, y=Y, fill=cokriged2$Filler_log.pred))+
geom_raster()+
coord_fixed()+
scale_fill_gradientn(colors = c("blue2","purple", "red2"))+
geom_point(data=dt[subset=="training",], inherit.aes = F, aes(x=X, y=Y), color="black", size=1)+
labs(fill="Filler [%]")
a
While Filler is not a real chemical variable, it is shown for reference purposes. It shows the opposite behaviour to iron. This is logical, since iron is the most abundant element and the covariates form a composition.
a <- ggplot(data = cokriged2, aes(x=X, y=Y, fill=cokriged2$Mg_log.pred))+
geom_raster()+
coord_fixed()+
scale_fill_gradientn(colors = c("blue2","purple", "red2"))+
geom_point(data=dt[subset=="training",], inherit.aes = F, aes(x=X, y=Y), color="black", size=1)+
labs(fill="Mg [%]")
a
Mg is very scarce in large areas, with the main area below 10 %. However, small strips in the central north as well as in the south show concentrations of up to 30 %.
a <- ggplot(data = cokriged2, aes(x=X, y=Y, fill=cokriged2$Ni_log.pred))+
geom_raster()+
coord_fixed()+
scale_fill_gradientn(colors = c("blue2","purple", "red2"))+
geom_point(data=dt[subset=="training",], inherit.aes = F, aes(x=X, y=Y), color="black", size=1)+
labs(fill="Ni [%]")
a
Ni shows several areas with higher contents, especially in the west with up to 1.5 % Ni. Higher concentrations can also be found in the centre and south-east.
In this chapter, the mixed-model cokriging is shown. Here, all six continuous chemical variables, including the filler variable, and the Lcode are combined in one model. According to Journel and Rossi 1989 [6], kriging with a trend is mainly of importance when extrapolation is sought. Since this is not the case here, no trend is used.
#populate gstat with num-vars
{gs_cs=NULL
for(j in vars_l){
frm = as.formula(paste(j,"1", sep="~"))
gs_cs = gstat(gs_cs, id=j, formula=frm, locations=~X+Y, data=dt[subset=="training"], nmax = 20, omax=6, maxdist = 200,nmin = 5)
}
}
#add cat. levels
gs_cs <- gstat(id="FZ", formula = (Lcode=="FZ")~1, locations = ~X+Y, data=dt[subset=="training"], nmax = 20, omax=6, maxdist = 200,g = gs_cs, nmin = 5) %>%
gstat(id="SA", formula = (Lcode=="SA")~1, locations = ~X+Y, data=dt[subset=="training"], nmax = 20, omax=6, maxdist = 200, nmin = 5) %>%
gstat(id="SM", formula = (Lcode=="SM")~1, locations = ~X+Y, data=dt[subset=="training"], nmax = 20, omax=6, maxdist = 200, nmin = 5) %>%
gstat(id="UM", formula = (Lcode=="UM")~1, locations = ~X+Y, data=dt[subset=="training"], nmax = 20, omax=6, maxdist = 200,nmin = 5)
To incorporate anisotropy, the modelling procedure is as follows: first, the empirical variograms are evaluated. Then, using the varioApplet provided during the short-course sessions, a preliminary fit with regard to anisotropy is obtained. The following anisotropy values were extracted:
The following variogram shows the empirical variogram with the fitted anisotropic model for Fe_log. The azimuth was varied in 30° steps; the tolerance equals 15° (default).
# empirical variogram
vg_cs <- variogram(gs_cs,width=30, cutoff=300, alpha=c(0:5)*30)
anis <- c(156,0.52) #anisotropy values from applet
#fit model
vg_cs_m = vgm(psill=0.9, range=300, nugget=1,anis= anis, model="Sph")
gs_cs= gstat::fit.lmc(v=vg_cs, model=vg_cs_m, g=gs_cs, correct.diagonal = 1.0001)
#gs_cs$model <- model4
##draw selected plot with code from applet
# number of directions
ndirs = vg_cs$dir.hor %>% unique %>% length
# color scale
col_dirs = RColorBrewer::brewer.pal(ndirs,"Set1")
# plot
vg0 = vg_cs[vg_cs$id=="Fe_log",]
plot(gamma~dist, data=vg0, pch=21, cex=1.2,
bg=col_dirs[as.factor(dir.hor)],
xlim=range(0,vg0$dist), ylim=range(0, vg0$gamma))
vg0[, c("dist", "gamma", "dir.hor")] %>%
split(as.factor(vg0$dir.hor)) %>%
sapply(lines, col="grey") %>% invisible
myfun = function(x){
lines(x[,c(1,2)], col=col_dirs[x$dir.hor/30+1], lty=2)
}
vg0[,c("dist", "gamma", "dir.hor")] %>%
split(as.factor(vg0$dir.hor)) %>% sapply(myfun) %>% invisible
for(i in 1:6){
angle = pi/2 - ((i-1)*30)*pi/180
direction = c(sin(angle), cos(angle) ,0)
variodens = variogramLine(gs_cs$model$Fe_log, maxdist= 1.1*max(vg_cs$dist), dir=direction)
lines(variodens, col=col_dirs[i], lwd=2)
}
legend(x = "bottomright", legend = vg_cs$dir.hor %>% unique %>% paste(., "°") , fill=col_dirs )
It can be seen that the fit is not perfect, but the general features of the variography are captured.
cok_cs = predict(gs_cs, newdata=xy_grid, debug.level = -1, nmax=20, omax = 6 ) %>% as.data.table()
## Linear Model of Coregionalization found. Good.
## [using ordinary cokriging]
##
#calculation of variance cutoff
var_cutoff <- cok_cs$Fe_log.var %>% quantile(0.90, na.rm = T)
#omitting the respective values
cok_cs <- cok_cs[Fe_log.var < var_cutoff]
#predictions for evaluation set
cok_cs_val <- predict(gs_cs, newdata=dt[subset=="validation", c("X", "Y")], debug.level=-1, nmax=20, omax = 6)
## Linear Model of Coregionalization found. Good.
## [using ordinary cokriging]
##
#calculating most probable Lcode from results
Lcodes <- c("FZ", "SA", "SM", "UM")
cok_cs_val$Lcode <- cok_cs_val %>% select(paste0(Lcodes, ".pred")) %>% max.col() %>%
factor(x = ., levels = c(1,2,3,4), labels = Lcodes)
cok_cs$Lcode <- cok_cs %>% select(paste0(Lcodes, ".pred")) %>% max.col() %>%
factor(x = ., levels = c(1,2,3,4), labels = Lcodes)
#kriging results of backtransformed values
cok_cs2 <- cok_cs %>% select(paste0(fkt_new,".pred")) %>% exp %>% cbind(cok_cs[,c("X", "Y")], .)
a <- ggplot(data = cok_cs2, aes(x=X, y=Y, fill=cok_cs2$Ni_log.pred))+
geom_raster()+
coord_fixed()+
scale_fill_gradientn(colors = c("blue2","purple", "red2"))+
geom_point(data=dt[subset=="training",], inherit.aes = F, aes(x=X, y=Y), color="black", size=1)+
labs(fill="Ni [%]", title = "Backtransformed Ni")
b <- ggplot(data = cok_cs, aes(x=X, y=Y, fill=cok_cs$Lcode))+
geom_raster()+
coord_fixed()+
geom_point(data=dt[subset=="training",], inherit.aes = F, aes(x=X, y=Y, fill=Lcode), size=3, shape=21)+
labs(fill="Lcode", title = "Lcode")
c <- ggplot(data = cok_cs, aes(x=X, y=Y, fill=cok_cs$Ni_log.var))+
geom_raster()+
coord_fixed()+
scale_fill_gradientn(colors = c("blue2","purple", "red2"))+
geom_point(data=dt[subset=="training",], inherit.aes = F, aes(x=X, y=Y), color="black", size=1)+
labs(fill="Var", title = "Variance Ni_log")
grid.arrange(a,b,c, nrow=2)
The second plot reveals a problematic situation with Lcode. Since the sampling sites for SA are relatively sparsely distributed, the variogram could not capture any spatial variance changes even at the minimum bin. This can be seen in the variogram for SA. As a result, even in cells that are very close to sample sites that sampled SA, SA is not the “most probable” Lcode. This produces rather odd results for the validation of Lcode, as shown in the following:
CM_m <- cvms::confusion_matrix(targets =dt[subset=="validation", Lcode], predictions = cok_cs_val$Lcode )
cvms::plot_confusion_matrix(CM_m$`Confusion Matrix`[[1]],
add_sums = TRUE,
add_normalized = F,
diag_percentages_only = T, add_row_percentages = F)
It can be seen that no validation data points were classified as UM or SA, although 27 and 32 members, respectively, exist in these classes. The sensitivity values lie at 65.7% for SM and 78.2% for FZ. It follows that the overall accuracy lies at 60.2%, meaning that this model is marginally more accurate, by 1.4 percentage points. This is remarkable given the total misclassification of the minor groups UM and SA, and can be attributed to their relatively low contribution to the total sample count.
With compositional modelling, the covariates are treated as parts of a whole that sums to one: when one variable decreases, one or several others have to increase to maintain the constant sum. As such, the data form a multidimensional simplex. To illustrate this, the ternary diagrams for the abundant variables Fe, Mg and Al, as well as for the elements Co, Ni and Fe, are shown below. The diagram scales were centred, so the sparse variables can be compared visually to the abundant Fe.
dtcomp0 <- dt %>% select(c("Fe", "Mg", "Al")) %>% acomp
plot.acomp(dtcomp0,center = T)
dtcomp0 <- dt %>% select(c("Co", "Ni", "Fe")) %>% acomp
plot.acomp(dtcomp0,center = T)
It shows, for example, that very high values of Mg correspond to lower to medium Al values. Also, rising Mg values up to a certain point go in line with a relatively stable ratio of Fe to Al.
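The logratio machinery used in this chapter can be illustrated on a toy composition; a minimal sketch with the compositions package (hypothetical values):

```r
library(compositions)

# Hypothetical 3-part composition; acomp() closes it to a constant sum of 1.
x <- acomp(c(Al = 5, Mg = 10, Fe = 85))
z <- alr(x)          # 2 logratios with the last part (Fe) as denominator
x_back <- alrInv(z)  # the inverse recovers the closed composition
```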
Subsequently, swath plots of the compositions are used to identify whether a spatial trend needs to be considered.
dtt <- dt[subset=="training",]
#dtt <- dt
dtcomp = dtt %>% select(Al:Ni) %>% acomp # calculate compositions
X = dtt %>% select(X, Y) #coordinates
swath(dtcomp, loc=X$X) # quite flat --> constant mean
In X-direction, no larger trends are seen.
swath(dtcomp, loc=X$Y)
In Y-direction, local trends can be found. This is especially true for the Mg composition at low Y-values, and to some extent for Co and Al. However, this does not apply to all variables; as such, a constant mean is assumed.
# define the gmSpatialModel object
gmv = NULL
gmv = make.gmCompositionalGaussianSpatialModel(
data = dtcomp, # compositional data
coords = X, # spatial coordinates
V = "alr", # use alr logratios
formula = ~1+Y, # drift in Y-direction (dropped again for the final prediction model)
nmax = 20, # cokriging neighbourhood has max. 20 samples
maxdist = 300 # search radius of 300 m
)
# compute logratio variograms
lrvg = gmGeostats::logratioVariogram(gmv,maxdist=300, nbins=8)
# define a compositional linear model
lrmd = CompLinModCoReg(~nugget()+sph(200), comp=dtcomp)
# fit
lrmdf = gmGeostats::fit_lmc(v = lrvg, model=lrmd)
## iteration = 0
## Step:
## [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## Parameter:
## [1] 1.00000e+00 0.00000e+00 1.00000e+00 0.00000e+00 0.00000e+00 1.00000e+00
## [7] 0.00000e+00 0.00000e+00 0.00000e+00 1.00000e+00 0.00000e+00 0.00000e+00
## [13] 2.77556e-17 0.00000e+00 1.00000e+00 2.00000e+02 1.00000e+00 0.00000e+00
## [19] 1.00000e+00 0.00000e+00 0.00000e+00 1.00000e+00 0.00000e+00 0.00000e+00
## [25] 0.00000e+00 1.00000e+00 0.00000e+00 0.00000e+00 2.77556e-17 0.00000e+00
## [31] 1.00000e+00
## Function Value
## [1] 10893
## Gradient:
## [1] -108.156693 297.619525 3049.300014 -368.288145 -748.182243
## [6] 2766.682668 -779.763677 -15.053831 -921.636873 -6057.064429
## [11] -1334.451479 -78.861751 -564.866807 1160.271098 2968.953504
## [16] -0.936274 -77.547524 216.663911 2492.878568 -331.235718
## [21] -602.925544 2234.538662 -647.137816 -9.206993 -751.703716
## [26] -4987.016242 -1109.433486 -55.840799 -460.259636 958.687200
## [31] 2427.462565
##
## iteration = 100
## Parameter:
## [1] 1.1721934 -0.2596609 0.4860233 -0.1436749 0.0773978 0.6690391
## [7] 0.6742597 0.1745889 1.6622803 0.3064601 0.3795978 0.1200620
## [13] -0.0366835 0.3468574 0.2425170 199.6695968 1.0909144 -0.1408501
## [19] 0.3306164 0.2670870 0.2278202 0.7385879 1.4286961 0.0324616
## [25] 1.3912912 0.9490167 0.4769990 0.1523720 -0.1642027 -0.0517475
## [31] -0.1553782
## Function Value
## [1] 49.3651
## Gradient:
## [1] -0.0353743 -0.2617160 -0.1434245 0.3056782 0.4594159 -0.4388945
## [7] -0.1237833 0.3831176 0.9725194 0.3064568 -0.3401379 0.6288761
## [13] -0.4948074 -0.4269850 -0.5709800 0.8095726 -0.3825595 -0.0790436
## [19] 0.1121595 -0.1432528 -0.0173355 0.1245203 0.4483971 0.1058077
## [25] -0.6995001 -0.3552509 0.1435383 -0.1836260 0.5693022 1.4424501
## [31] 0.1776168
##
## Iteration limit exceeded. Algorithm failed.
# plot
plot(lrvg, lrvg=vgram2lrvgram(lrmdf))
Due to an unknown error, the anisotropy could not be fitted; hence, a general omnidirectional model is utilised. The code for fitting an anisotropic model is kept, commented out, below.
# lrvga = gmGeostats::logratioVariogram(gmv, maxdist=300, nbins=10,
# azimuth=c(0,45,90,115), azimuth.tol=60)
#
# colors <- c("blue", "red", "yellow", "purple")
#
# plot(lrvga, col= colors)
# class(lrvga)
#
#
# lrmda = LMCAnisCompo(Z=dtcomp,azimuths=c(0,45,90,115), models=c("nugget", "sph"))
#
# lrmdfa = gmGeostats::fit_lmc(v = lrvga,g=gmv, model=lrmda)
#
#
# plot(lrvga, lrvg=vgram2lrvgram(lrmdfa))
Shown below is a plot of the directional behaviour of the variograms.
#
gmGeostats::logratioVariogram(gmv, maxdist=300, nbins=8,
azimuth=(0:11)*30, azimuth.tol=60) %>%
image
It can be seen that in E-W direction, a generally lower variation is observed for most logratio pairs.
Subsequently, the kriging results for the backtransformed logratio compositions are shown for three elements: Al, Co and Ni.
gmv = make.gmCompositionalGaussianSpatialModel(
data = dtcomp, # compositional data
coords = X, # spatial coordinates
V = "alr", # use alr logratios
# formula = ~1+Y, # !!!creates bug at interpolation
formula = ~1, # constant mean instead of the Y-drift
nmax = 30, # cokriging neighbourhood has max. 30 samples
maxdist = 400, # search radius of 400 m
omax = 6,
model = as.LMCAnisCompo(lrmdf) # necessary conversion
)
#logratio cokriging
cokrig_alr = predict(gmv, newdata=xy_grid, debug.level=-1)
## starting cokriging
## Linear Model of Coregionalization found. Good.
## [using ordinary cokriging]
##
# cutoff of maximum variance
var_cutoff <- cokrig_alr$alr1.var %>% quantile(0.9, na.rm = T)
cokrig_alr <- cokrig_alr %>% as.data.table()
cokrig_alr <- cokrig_alr[alr1.var < var_cutoff]
#backtransforming to original variables
cokrig_compo = gsi.gstatCokriging2compo(
cokrig_alr, V="alr", orignames = colnames(dtcomp))
#transform into percent
cokrig_compo <- cokrig_compo %>% as.data.frame %>% multiply_by(100)
#model performance for comparison
cok_alr_val <- predict(gmv, newdata=dt[subset=="validation", c("X", "Y")], debug.level=-1, nmax=20, omax = 6)
## starting cokriging
## Linear Model of Coregionalization found. Good.
## [using ordinary cokriging]
##
cok_alr_val = cbind(cok_alr_val[,1:2],
gsi.gstatCokriging2compo(
cok_alr_val, V="alr", orignames = colnames(dtcomp)) %>%
as.data.frame %>% multiply_by(100)
)
ggplot(data = cokrig_alr, aes(x = X, y = Y, fill = cokrig_compo$Al))+
geom_raster()+
coord_fixed()+
scale_fill_gradientn(colors = c("blue2","purple", "red2"))+
geom_point(data=dt[subset=="training",], inherit.aes = F, aes(x=X, y=Y), color="black", size=1)+
labs(fill="Al [%]")
ggplot(data = cokrig_alr, aes(x = X, y = Y, fill = cokrig_compo$Co))+
geom_raster()+
coord_fixed()+
scale_fill_gradientn(colors = c("blue2","purple", "red2"))+
geom_point(data=dt[subset=="training",], inherit.aes = F, aes(x=X, y=Y), color="black", size=1)+
labs(fill="Co [%]")
ggplot(data = cokrig_alr, aes(x = X, y = Y, fill = cokrig_compo$Ni))+
geom_raster()+
coord_fixed()+
scale_fill_gradientn(colors = c("blue2","purple", "red2"))+
geom_point(data=dt[subset=="training",], inherit.aes = F, aes(x=X, y=Y), color="black", size=1)+
labs(fill="Ni [%]")
For the cosimulations, n = 10 realisations are drawn for each of the three variants:
n=10
# simulation for cokriging with log-values
set.seed(123)
cosim = predict(gs_l, newdata=xy_grid, debug.level=-1, nmax=20, omax = 6, nsim = n)
## drawing 10 multivariate GLS realisations of beta...
## Linear Model of Coregionalization found. Good.
## [using conditional Gaussian cosimulation]
##
# simulation for logratio compositions
cos_alr = predict(gmv, newdata=xy_grid, debug.level=-1,nmax=20, omax=6, nsim=n)
## starting cokriging
## drawing 10 multivariate GLS realisations of beta...
## Linear Model of Coregionalization found. Good.
## [using conditional Gaussian cosimulation]
##
cos_alr1 <- cos_alr # keep a backup copy (restored before the backtransformation below)
# outcome analysis
class(cos_alr)
## [1] "data.frame"
dim(cos_alr)
## [1] 15759 52
colnames(cos_alr)
## [1] "X" "Y" "alr1.sim1" "alr1.sim2" "alr1.sim3"
## [6] "alr1.sim4" "alr1.sim5" "alr1.sim6" "alr1.sim7" "alr1.sim8"
## [11] "alr1.sim9" "alr1.sim10" "alr2.sim1" "alr2.sim2" "alr2.sim3"
## [16] "alr2.sim4" "alr2.sim5" "alr2.sim6" "alr2.sim7" "alr2.sim8"
## [21] "alr2.sim9" "alr2.sim10" "alr3.sim1" "alr3.sim2" "alr3.sim3"
## [26] "alr3.sim4" "alr3.sim5" "alr3.sim6" "alr3.sim7" "alr3.sim8"
## [31] "alr3.sim9" "alr3.sim10" "alr4.sim1" "alr4.sim2" "alr4.sim3"
## [36] "alr4.sim4" "alr4.sim5" "alr4.sim6" "alr4.sim7" "alr4.sim8"
## [41] "alr4.sim9" "alr4.sim10" "alr5.sim1" "alr5.sim2" "alr5.sim3"
## [46] "alr5.sim4" "alr5.sim5" "alr5.sim6" "alr5.sim7" "alr5.sim8"
## [51] "alr5.sim9" "alr5.sim10"
#debugged version to backtransform the realisations into one container
cos_alr <- cos_alr1 # restore from backup
cos_alr <- cos_alr[,-c(1,2)] # drop the coordinate columns
alrs <- c("alr1", "alr2", "alr3", "alr4", "alr5")
cos_comp <- list()
for (i in 1:n){
cos_comp[[i]] <- cos_alr %>% select(paste0(alrs, ".sim", i)) %>% alrInv(z = ., orig = dtcomp)
}
# mixed model cosimulation
cos_cs = predict(gs_cs, newdata=xy_grid, debug.level = -1, nmax=20, omax = 6, nsim=n) %>% as.data.table()
## drawing 10 multivariate GLS realisations of beta...
## Linear Model of Coregionalization found. Good.
## [using conditional Gaussian cosimulation]
##
One realisation of each cosimulation variant is shown below for Ni:
b <- ggplot(data = cosim, aes(x=X, y=Y, fill=cosim$Ni_log.sim1 %>% exp()))+
geom_raster()+
coord_fixed()+
scale_fill_gradientn(colors = c("blue2","purple", "red2"))+
#geom_point(data=dt[subset=="training",], inherit.aes = F, aes(x=X, y=Y), color="black", size=1)+
labs(fill="Ni [%]", title ="Cosimulation w. log-values")
a <- ggplot(data = cos_cs, aes(x=X, y=Y, fill=cos_cs$Ni_log.sim1%>% exp()))+
geom_raster()+
coord_fixed()+
scale_fill_gradientn(colors = c("blue2","purple", "red2"))+
#geom_point(data=dt[subset=="training",], inherit.aes = F, aes(x=X, y=Y), color="black", size=1)+
labs(fill="Ni [%]", title ="Mixed model cosimulation w. log-values")
c <- ggplot(data = cos_comp[[3]] %>% as.data.frame(), aes(x=cosim$X, y=cosim$Y, fill=Ni*100))+
geom_raster()+
coord_fixed()+
scale_fill_gradientn(colors = c("blue2","purple", "red2"))+
#geom_point(data=dt[subset=="training",], inherit.aes = F, aes(x=X, y=Y), color="black", size=1)+
labs(fill="Ni [%]", title ="Compositional logratio cosimulation")
grid.arrange(b, a,c, nrow=2)
# mean and coefficient of variation per cell across the realisations
# cosimulation with log-values
for (i in fkt_new) {
  # select only the .sim columns, so the freshly added .mean column is not
  # accidentally included in the statistics
  sims_i <- cosim %>% select(matches(paste0(i, "\\.sim"))) %>% exp()
  cosim[,paste0(i,".mean")] <- apply(sims_i, 1, mean) %>% as.vector()
  cosim[,paste0(i,".vc")] <- apply(sims_i, 1, function(x) sd(x)/mean(x)) %>% as.vector()
}
# mixed model
for (i in fkt_new) {
  sims_i <- cos_cs %>% select(matches(paste0(i, "\\.sim"))) %>% exp()
  cos_cs[,paste0(i,".mean")] <- apply(sims_i, 1, mean) %>% as.vector()
  cos_cs[,paste0(i,".vc")] <- apply(sims_i, 1, function(x) sd(x)/mean(x)) %>% as.vector()
}
# logratio cosimulation
cos_comp1 <- array(as.numeric(unlist(cos_comp)), dim=c(nrow(cos_comp[[1]]),
ncol(cos_comp[[1]]),
n))
cos_comp_sum <- data.table()
o <- 0
for (i in colnames.num) {
o <- o+1
cos_comp_sum[,paste0(i,".mean")] <- cos_comp1[,o,] %>% as.data.frame() %>% apply(.,1, mean) %>% as.vector() *100
cos_comp_sum[,paste0(i,".vc")] <- cos_comp1[,o,] %>% as.data.frame() %>% apply(.,1, function (x)(sd(x*100)/mean(x*100))) %>% as.vector()
}
cos_comp_sum <- cbind(cos_comp_sum, X=cosim$X, Y=cosim$Y)
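The summary statistics above reduce to a row-wise mean and a coefficient of variation across realisations; a toy sketch with made-up values:

```r
# Toy ensemble: 2 grid cells (rows) x 3 realisations (columns).
sims <- matrix(c(1.0, 1.2, 0.8,
                 2.0, 2.2, 1.8),
               nrow = 2, byrow = TRUE)
cell_mean <- apply(sims, 1, mean)                       # ensemble mean per cell
cell_vc <- apply(sims, 1, function(x) sd(x) / mean(x))  # relative spread per cell
```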
The plots of the simulated mean values are again shown using the example of Ni.
b <- ggplot(data = cosim, aes(x=X, y=Y, fill=(cosim$Ni_log.mean)))+
geom_raster()+
coord_fixed()+
scale_fill_gradientn(colors = c("blue2","purple", "red2"))+
#geom_point(data=dt[subset=="training",], inherit.aes = F, aes(x=X, y=Y), color="black", size=1)+
labs(fill="Ni [%]", title ="Cosimulation w. log-values")
a <- ggplot(data = cos_cs, aes(x=X, y=Y, fill=(cos_cs$Ni_log.mean)))+
geom_raster()+
coord_fixed()+
scale_fill_gradientn(colors = c("blue2","purple", "red2"))+
#geom_point(data=dt[subset=="training",], inherit.aes = F, aes(x=X, y=Y), color="black", size=1)+
labs(fill="Ni [%]", title ="Mixed model cosimulation w. log-values")
c <- ggplot(data = cos_comp_sum, aes(x=X, y=Y, fill=Ni.mean))+
geom_raster()+
coord_fixed()+
scale_fill_gradientn(colors = c("blue2","purple", "red2"))+
#geom_point(data=dt[subset=="training",], inherit.aes = F, aes(x=X, y=Y), color="black", size=1)+
labs(fill="Ni [%]", title ="Compositional logratio cosimulation")
grid.arrange(b,a,c, nrow=2)
While V1 and V2 show very similar behaviour, V3 shows slightly smaller and less “peaky” values.
b <- ggplot(data = cosim, aes(x=X, y=Y, fill=(cosim$Ni_log.vc)))+
geom_raster()+
coord_fixed()+
scale_fill_gradientn(colors = c("blue2","purple", "red2"))+
#geom_point(data=dt[subset=="training",], inherit.aes = F, aes(x=X, y=Y), color="black", size=1)+
labs(fill="VC", title ="Cosimulation w. log-values")
a <- ggplot(data = cos_cs, aes(x=X, y=Y, fill=(cos_cs$Ni_log.vc)))+
geom_raster()+
coord_fixed()+
scale_fill_gradientn(colors = c("blue2","purple", "red2"))+
#geom_point(data=dt[subset=="training",], inherit.aes = F, aes(x=X, y=Y), color="black", size=1)+
labs(fill="VC", title ="Mixed model cosimulation w. log-values")
c <- ggplot(data = cos_comp_sum, aes(x=X, y=Y, fill=Ni.vc))+
geom_raster()+
coord_fixed()+
scale_fill_gradientn(colors = c("blue2","purple", "red2"))+
#geom_point(data=dt[subset=="training",], inherit.aes = F, aes(x=X, y=Y), color="black", size=1)+
labs(fill="VC", title ="Compositional logratio cosimulation")
grid.arrange(b,a,c, nrow=2)
It can be seen that large differences in the coefficient of variation appear between the three variants.
n=10
# simulation for cokriging with log-values V1
set.seed(123)
cosim_val = predict(gs_l, newdata=dt[subset=="validation", c("X", "Y")], debug.level=-1, nmax=20, omax = 6, nsim = n)
## drawing 10 multivariate GLS realisations of beta...
## Linear Model of Coregionalization found. Good.
## [using conditional Gaussian cosimulation]
##
# mixed model cosimulation
cos_cs_val = predict(gs_cs, newdata=dt[subset=="validation", c("X", "Y")], debug.level = -1, nmax=20, omax = 6, nsim=n) %>% as.data.table()
## drawing 10 multivariate GLS realisations of beta...
## Linear Model of Coregionalization found. Good.
## [using conditional Gaussian cosimulation]
##
# simulation for logratio compositions
cos_alr_val = predict(gmv, newdata=dt[subset=="validation", c("X", "Y")], debug.level=-1,nmax=20, omax=6, nsim=n)
## starting cokriging
## drawing 10 multivariate GLS realisations of beta...
## Linear Model of Coregionalization found. Good.
## [using conditional Gaussian cosimulation]
##
## 48% done
## 100% done
# outcome analysis
class(cos_alr)
## [1] "data.frame"
dim(cos_alr)
## [1] 15759 50
colnames(cos_alr)
## [1] "alr1.sim1" "alr1.sim2" "alr1.sim3" "alr1.sim4" "alr1.sim5"
## [6] "alr1.sim6" "alr1.sim7" "alr1.sim8" "alr1.sim9" "alr1.sim10"
## [11] "alr2.sim1" "alr2.sim2" "alr2.sim3" "alr2.sim4" "alr2.sim5"
## [16] "alr2.sim6" "alr2.sim7" "alr2.sim8" "alr2.sim9" "alr2.sim10"
## [21] "alr3.sim1" "alr3.sim2" "alr3.sim3" "alr3.sim4" "alr3.sim5"
## [26] "alr3.sim6" "alr3.sim7" "alr3.sim8" "alr3.sim9" "alr3.sim10"
## [31] "alr4.sim1" "alr4.sim2" "alr4.sim3" "alr4.sim4" "alr4.sim5"
## [36] "alr4.sim6" "alr4.sim7" "alr4.sim8" "alr4.sim9" "alr4.sim10"
## [41] "alr5.sim1" "alr5.sim2" "alr5.sim3" "alr5.sim4" "alr5.sim5"
## [46] "alr5.sim6" "alr5.sim7" "alr5.sim8" "alr5.sim9" "alr5.sim10"
#debugged version to backtransform results in one container
cos_alr_val <- cos_alr_val[,-c(1,2)]
alrs <- c("alr1", "alr2", "alr3", "alr4","alr5")
i=1
cos_comp_val <- list()
for (i in 1:n){
cos_comp_val[[i]] <- cos_alr_val %>% select(paste0(alrs,".sim", i)) %>%alrInv(z = .,orig = dtcomp)
}
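Conceptually, `alrInv` appends the reference part and closes the composition to sum 1. A minimal sketch of that inverse transform, written here in Python with hypothetical inputs rather than the report's R objects:

```python
import numpy as np

def alr_inv(z):
    """Inverse additive-logratio transform.

    z: (n, D-1) array of alr coordinates. Returns an (n, D) array of
    compositions summing to 1, with the last column as reference part.
    """
    z = np.asarray(z, dtype=float)
    # exponentiate the logratios and append the reference part (ratio = 1)
    parts = np.hstack([np.exp(z), np.ones((z.shape[0], 1))])
    # closure: rescale each row so that the parts sum to 1
    return parts / parts.sum(axis=1, keepdims=True)

comp = alr_inv([[0.0, 0.0]])  # equal logratios -> three equal parts
```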
# mean values for validation
#cosimulation with logvalues
for (i in fkt_new) {
cosim_val[,paste0(i,".mean")] <- cosim_val %>% select(contains(i)) %>% exp() %>% apply(.,1, mean) %>% as.vector()
cosim_val[,paste0(i,".vc")] <- cosim_val %>% select(contains(i))%>% exp() %>% apply(.,1, function (x)(sd(x)/mean(x))) %>% as.vector()
}
# mixed model
for (i in fkt_new) {
cos_cs_val[,paste0(i,".mean")] <- cos_cs_val %>% select(contains(i))%>% exp() %>% apply(.,1, mean) %>% as.vector()
cos_cs_val[,paste0(i,".vc")] <- cos_cs_val %>% select(contains(i))%>% exp() %>% apply(.,1, function (x)(sd(x)/mean(x))) %>% as.vector()
}
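Both loops above reduce the n realisations per grid node to a mean and a coefficient of variation after back-transforming with `exp`. The same reduction as a minimal Python sketch (hypothetical array layout: rows are grid nodes, columns are realisations):

```python
import numpy as np

def summarise_sims(log_sims):
    """Reduce simulated log-values to a per-node mean and coefficient
    of variation on the original concentration scale."""
    sims = np.exp(np.asarray(log_sims, dtype=float))  # back-transform
    m = sims.mean(axis=1)
    vc = sims.std(axis=1, ddof=1) / m  # sd/mean, as in the report (R's sd uses ddof=1)
    return m, vc

m, vc = summarise_sims([[0.0, 0.0], [np.log(1.0), np.log(3.0)]])
```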
# logratio cosimulation
cos_comp1_val <- array(as.numeric(unlist(cos_comp_val)), dim=c(nrow(cos_comp_val[[1]]),
ncol(cos_comp_val[[1]]),
n))
cos_comp_sum_val <- data.table()
o <- 0
for (i in colnames.num) {
o <- o+1
cos_comp_sum_val[,paste0(i,".mean")] <- cos_comp1_val[,o,] %>% as.data.frame() %>% apply(.,1, mean) %>% as.vector() *100
cos_comp_sum_val[,paste0(i,".vc")] <- cos_comp1_val[,o,] %>% as.data.frame() %>% apply(.,1, function (x)(sd(x*100)/mean(x*100))) %>% as.vector()
}
cos_comp_sum_val <- cbind(cos_comp_sum_val,
X=dt[subset=="validation", c("X")],
Y=dt[subset=="validation", c("Y")])
In order to evaluate the performance of the different models, the validation part of the data set is used for cross-validation. The mean absolute error of the prediction (in %) is calculated.
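Since both predictions and observations are stored as log-values, they are back-transformed with `exp` before differencing. A minimal Python sketch of this error measure (hypothetical inputs, not the report's R objects):

```python
import numpy as np

def mae_backtransformed(log_obs, log_pred):
    """Mean absolute error in % after back-transforming log-values with exp."""
    obs = np.exp(np.asarray(log_obs, dtype=float))
    pred = np.exp(np.asarray(log_pred, dtype=float))
    return np.mean(np.abs(obs - pred))

err = mae_backtransformed([np.log(2.0), np.log(4.0)],
                          [np.log(1.0), np.log(4.0)])  # errors of 1 and 0 -> MAE 0.5
```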
#dataframe consisting of predicted and actual validation values
#for VK1 cokriging with log values
dt.kv_l <- data.table(cokriged_val %>% select(contains(".pred")),
dt[subset=="validation"] %>% select(fkt_new))
sumry_l <- dt.kv_l %>% summarise(.,
Name = "VK1",
Al_me = mean(abs(exp(Al_log)-exp(Al_log.pred))),
Co_me = mean(abs(exp(Co_log)-exp(Co_log.pred))),
Fe_me = mean(abs(exp(Fe_log)-exp(Fe_log.pred))),
Filler_me = mean(abs(exp(Filler_log)-exp(Filler_log.pred))),
Mg_me = mean(abs(exp(Mg_log)-exp(Mg_log.pred))),
Ni_me = mean(abs(exp(Ni_log)-exp(Ni_log.pred))))
#for VK2 mixed model cokriging
dt.kv_cs <- data.table(cok_cs_val %>% select(paste0(fkt_new,".pred")),
dt[subset=="validation"] %>% select(fkt_new))
sumry_cs <- dt.kv_cs %>% summarise(.,
Name = "VK2",
Al_me = mean(abs(exp(Al_log)-exp(Al_log.pred))),
Co_me = mean(abs(exp(Co_log)-exp(Co_log.pred))),
Fe_me = mean(abs(exp(Fe_log)-exp(Fe_log.pred))),
Filler_me = mean(abs(exp(Filler_log)-exp(Filler_log.pred))),
Mg_me = mean(abs(exp(Mg_log)-exp(Mg_log.pred))),
Ni_me = mean(abs(exp(Ni_log)-exp(Ni_log.pred))))
# for VK3 logratio cokriging
# note: after exp(), the *_log columns hold the actual concentrations;
# the unsuffixed columns hold the back-transformed alr predictions
dt.alr <- data.table(cok_alr_val[,-c(1,2)] ,
dt[subset=="validation"] %>% select(fkt_new) %>% exp)
sumry_alr <- dt.alr %>% summarise(.,
Name = "VK3",
Al_me = mean(abs((Al_log)-(Al))),
Co_me = mean(abs((Co_log)-(Co))),
Fe_me = mean(abs((Fe_log)-(Fe))),
Filler_me = mean(abs((Filler_log)-(Filler))),
Mg_me = mean(abs((Mg_log)-(Mg))),
Ni_me = mean(abs((Ni_log)-(Ni))))
#######for cosimulations######
#for VS1 cosimulation with log values
dt.cv_l <- data.table(cosim_val %>% select(contains(".mean")),
dt[subset=="validation"] %>% select(fkt_new))
sumry_cvl <- dt.cv_l %>% summarise(.,
Name = "VS1",
Al_me = mean(abs(exp(Al_log)-(Al_log.mean))),
Co_me = mean(abs(exp(Co_log)-(Co_log.mean))),
Fe_me = mean(abs(exp(Fe_log)-(Fe_log.mean))),
Filler_me = mean(abs(exp(Filler_log)-(Filler_log.mean))),
Mg_me = mean(abs(exp(Mg_log)-(Mg_log.mean))),
Ni_me = mean(abs(exp(Ni_log)-(Ni_log.mean))))
#for VS2 mixed model cosimulation
dt.cv_cs <- data.table(cos_cs_val %>% select(paste0(fkt_new,".mean")),
dt[subset=="validation"] %>% select(fkt_new))
sumry_cv_cs <- dt.cv_cs %>% summarise(.,
Name = "VS2",
Al_me = mean(abs(exp(Al_log)-(Al_log.mean))),
Co_me = mean(abs(exp(Co_log)-(Co_log.mean))),
Fe_me = mean(abs(exp(Fe_log)-(Fe_log.mean))),
Filler_me = mean(abs(exp(Filler_log)-(Filler_log.mean))),
Mg_me = mean(abs(exp(Mg_log)-(Mg_log.mean))),
Ni_me = mean(abs(exp(Ni_log)-(Ni_log.mean))))
# for VS3 logratio cosimulation
dt.cos_alr <- data.table(cos_comp_sum_val ,
dt[subset=="validation"] %>% select(fkt_new) %>% exp)
sumry_cos_alr <- dt.cos_alr %>% summarise(.,
Name = "VS3",
Al_me = mean(abs((Al_log)-(Al.mean))),
Co_me = mean(abs((Co_log)-(Co.mean))),
Fe_me = mean(abs((Fe_log)-(Fe.mean))),
Filler_me = mean(abs((Filler_log)-(Filler.mean))),
Mg_me = mean(abs((Mg_log)-(Mg.mean))),
Ni_me = mean(abs((Ni_log)-(Ni.mean))))
####Summary####
sum_val <- rbind(sumry_l, sumry_cs, sumry_alr,
sumry_cvl,sumry_cv_cs,sumry_cos_alr) %>% kable(table.attr = "style = \"color: black;\"") %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed"))
sum_val
| Name | Al_me | Co_me | Fe_me | Filler_me | Mg_me | Ni_me |
|---|---|---|---|---|---|---|
| VK1 | 2.202556 | 0.0447709 | 8.856579 | 7.082687 | 3.946235 | 0.3153578 |
| VK2 | 2.270978 | 0.0443517 | 9.107742 | 7.221644 | 4.011018 | 0.3164846 |
| VK3 | 2.202224 | 0.0445384 | 8.807351 | 8.020995 | 3.882599 | 0.3085122 |
| VS1 | 2.644393 | 0.0549201 | 9.684120 | 7.184744 | 4.175863 | 0.3193899 |
| VS2 | 2.479498 | 0.0544557 | 8.930254 | 7.071952 | 4.039620 | 0.3149048 |
| VS3 | 6.447633 | 0.0540879 | 9.678661 | 8.847417 | 3.863677 | 0.3230897 |
# temp <- list()
# for (i in fkt_new) {
# temp[[i]] <- ggplot(data=dt.kv_l, aes(x = get(paste0(i, ".pred")), y=get(i)))+
# geom_point(size=0.5)+
# geom_abline(intercept = 0)+
# labs(title = i, x="Pred. V.", y= "Real V.")
#
#
# }
#
# do.call("grid.arrange", c(temp, ncol=3))
The table shows that the variants generally produce similar errors; only the logratio cosimulation stands out with a clearly larger error for Al.
To obtain additional detail, the predicted values are plotted against the measured values, which also gives a qualitative insight into model quality.
i <- "Ni"
a <- ggplot(data=dt.kv_l, aes(x = get(paste0(i, "_log.pred"))%>% exp(), y=get(paste0(i, "_log"))%>% exp()))+
geom_point(size=0.5)+
geom_abline(intercept = 0)+
coord_fixed()+
labs(title = "VK1", x="Pred. V.", y= "Real V.")
b <- ggplot(data=dt.kv_cs, aes(x = get(paste0(i, "_log.pred"))%>% exp(), y=get(paste0(i, "_log"))%>% exp()))+
geom_point(size=0.5)+
geom_abline(intercept = 0)+
coord_fixed()+
labs(title = "VK2", x="Pred. V.", y= "Real V.")
c <- ggplot(data=dt.alr, aes(x = get(paste0(i)), y=get(paste0(i, "_log"))))+
geom_point(size=0.5)+
geom_abline(intercept = 0)+
coord_fixed()+
labs(title = "VK3", x="Pred. V.", y= "Real V.")
d <- ggplot(data=dt.cv_l, aes(x = get(paste0(i, "_log.mean")), y=get(paste0(i, "_log"))%>% exp()))+
geom_point(size=0.5)+
geom_abline(intercept = 0)+
coord_fixed()+
labs(title = "VS1", x="Pred. V.", y= "Real V.")
e <- ggplot(data=dt.cv_cs, aes(x = get(paste0(i, "_log.mean")), y=get(paste0(i, "_log")) %>% exp()))+
geom_point(size=0.5)+
geom_abline(intercept = 0)+
coord_fixed()+
labs(title = "VS2", x="Pred. V.", y= "Real V.")
f <- ggplot(data=dt.cos_alr, aes(x = get(paste0(i, ".mean")), y= get(paste0(i, "_log"))))+
geom_point(size=0.5)+
geom_abline(intercept = 0)+
coord_fixed()+
labs(title = "VS3", x="Pred. V.", y= "Real V.")
grid.arrange(a,b,c,d,e,f, nrow=2)
It can be seen that …
VS2 is used for the final estimation, as it showed the best overall results.
cos_cs_f_pred <- readRDS(file = "prediction_final.RDS")
# calculate mean values for chemicals
for (i in fkt_new) {
cos_cs_f_pred[,paste0(i,".mean")] <- cos_cs_f_pred %>% select(contains(i))%>% exp() %>% apply(.,1, mean) %>% as.vector()
cos_cs_f_pred[,paste0(i,".vc")] <- cos_cs_f_pred %>% select(contains(i))%>% exp() %>% apply(.,1, function (x)(sd(x)/mean(x))) %>% as.vector()
}
#calculate sum LCode
for (i in Lcodes) {
cos_cs_f_pred[,(paste0(i,".sum"))] <-
cos_cs_f_pred %>%
select(contains(paste0(i, ".sim"))) %>% apply(.,1, median) %>% as.vector()
}
cos_cs_f_pred$Lcode.pred <- cos_cs_f_pred %>%
select(contains(".sum")) %>% max.col() %>% factor(x = .,levels = c(1,2,3,4),labels = Lcodes)
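The predicted lithology at each node is the class whose simulated indicators have the largest median across realisations. A minimal Python sketch of this decision rule (hypothetical labels; note that R's `max.col` breaks ties randomly, while `argmax` here takes the first maximum):

```python
import numpy as np

def predict_lithology(indicator_sims, labels):
    """Pick the per-node lithology as the class with the largest
    median simulated indicator.

    indicator_sims: (n_nodes, n_classes, n_sims) array of 0/1 indicators.
    """
    medians = np.median(indicator_sims, axis=2)          # per node and class
    return [labels[k] for k in medians.argmax(axis=1)]   # first maximum wins on ties

sims = np.array([[[1, 1, 0], [0, 0, 1]],   # node 1: first class dominates
                 [[0, 0, 0], [1, 1, 1]]])  # node 2: second class dominates
pred = predict_lithology(sims, ["FZ", "SM"])  # -> ["FZ", "SM"]
```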
In the following, the predicted Lcode and Ni are shown, with the sampling sites and their Lcode overlaid. The second graph is interactive: it can be zoomed, and the sampling points can be hidden by clicking on their symbols in the legend.
a <- ggplot(data = cos_cs_f_pred, aes(x=X, y=Y, fill=as.character(Lcode.pred)))+
geom_raster()+
coord_fixed()+
geom_point(data=dt, inherit.aes = F, aes(x=X, y=Y, fill=Lcode %>% as.character()), size=3, shape=21, alpha=0.5)+
labs(fill="Lcode", title = "Lcode")
a
plotly::plot_ly(title="Ni [%]") %>%
add_rasterly_heatmap(cos_cs_f_pred,
x= cos_cs_f_pred$X,
y= cos_cs_f_pred$Y,
z=cos_cs_f_pred$Ni_log.mean,
colorscale='Viridis') %>%
plotly::add_markers(data = dt, x=~X, y= ~Y, color= dt$Lcode,size=1,
marker=list(sizeref=10)) %>%
plotly::layout(title ="Ni [%] and Lcode sampling points",autosize = F )
It can be seen that the model fits only very small patches for the sparse lithologies SA and UM. It also tends to underestimate the peak behaviour of certain points; this, however, is expected because the means over the realisations are plotted.
cos_cs_f_pred %<>% as.data.table()
dt.m <- melt.data.table(cos_cs_f_pred %>% select(contains(".sim")) %>%
select(contains(fkt_new))) %>% as.data.table()
dt.m$value <- dt.m$value %>% exp()
Mg_max <- dt.m[variable %like% ("Mg")]$value %>% max() %>% round()
Mg_q999 <- dt.m[variable %like% ("Mg")]$value %>% quantile(.,0.999) %>% round()
In the following, the cumulative grade distribution for the individual elements is shown. The values are truncated at the 0.999 quantile, because the chosen simulation method on rare occasions produces very high values at single grid nodes. For example, the maximum for Mg lies at 215 %. This is clearly impossible and can be attributed to the fact that a non-compositional method is used here. The 0.999 quantile, by contrast, lies at 73 %. The vertical lines in the plots indicate the 0.999 quantile.
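The truncation step can be sketched as follows (Python, hypothetical data): compute the 0.999 quantile and keep only the values below it as the plotting range.

```python
import numpy as np

def truncate_at_quantile(values, q=0.999):
    """Return the values below the q-quantile together with that limit,
    discarding rare extreme simulation outcomes for plotting."""
    values = np.asarray(values, dtype=float)
    limit = np.quantile(values, q)
    return values[values <= limit], limit

# 999 plausible grades plus one physically impossible outlier
vals = np.concatenate([np.ones(999), [215.0]])
kept, limit = truncate_at_quantile(vals)
```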
i="Al_log"
temp <- list()
o=0
for (i in fkt_new) {
o=o+1
temp[[i]] <-
ggplot(dt.m[variable %like% (i)], aes(x = value, group = variable)) +
stat_ecdf(pad=F)+
geom_vline(xintercept = dt.m[variable %like% (i)]$value %>% max())+
geom_vline(xintercept = dt.m[variable %like% (i)]$value %>% quantile(.,0.999))+
labs(title=fkt[[o]], x = "content [%]")+
xlim(0,dt.m[variable %like% (i)]$value %>% quantile(.,0.999))
}
do.call("grid.arrange", c(temp, ncol=3))
To calculate the grade-cut-off plots, the custom function `GraTon.Tab` defined in the following code is used.
# function to calculate grade-cutoff-table
GraTon.Tab <- function(use, max.quantile=0.999, nsteps = 100) {
Q <- use %>% quantile(.,max.quantile) # upper cutoff limit at max.quantile
steps <- seq(from=0, to=Q, length.out=nsteps) # number of steps
gt <- data.table(cutoff=steps) #basic table
gt$n <- lapply(steps, function(x)(length(which(use>x )))) %>% unlist() # Number of cells above cutoff
gt$prop <- gt$n/max(gt$n) #proportion of n
gt$mean <- lapply(steps, function(x)(mean(use[use>x] ))) %>% unlist() # mean above cutoff
gt
}
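The logic of `GraTon.Tab` translated into a minimal Python sketch (hypothetical inputs): for each cut-off, count the nodes above it, their proportion of the total, and their mean grade.

```python
import numpy as np

def grade_tonnage_table(grades, max_quantile=0.999, nsteps=100):
    """Grade/cut-off table mirroring GraTon.Tab: for each cut-off, the
    number and proportion of nodes above it and their mean grade."""
    grades = np.asarray(grades, dtype=float)
    cutoffs = np.linspace(0.0, np.quantile(grades, max_quantile), nsteps)
    rows = []
    for c in cutoffs:
        above = grades[grades > c]                      # nodes above cut-off
        rows.append({"cutoff": c,
                     "n": above.size,
                     "mean": above.mean() if above.size else np.nan})
    n_max = max(r["n"] for r in rows)
    for r in rows:
        r["prop"] = r["n"] / n_max                      # proportion of n
    return rows

tab = grade_tonnage_table(np.arange(1, 101, dtype=float), nsteps=5)
```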
#create individual cutoff tables for all iterations and merge them into one table
temp <- list()
CO_all <- list()
coeff <- NULL
i="Al"
for (i in fkt) {
#apply to extract all iterations of given element and calculate single cutoff-table
CO_all[[i]] <- lapply(dt.m[variable %like% (i)]$variable %>% unique(),
function(x) (GraTon.Tab(use = dt.m[variable == (x)]$value,max.quantile = 0.99))) %>%
rbindlist(l = .,idcol = "Nr") #coerce individual tables into one
#coefficient for secondary axis transformation
coeff[paste(i)] <- max(CO_all[[i]]$mean)
CO_all[[i]]$mean <- CO_all[[i]]$mean/coeff[paste(i)]
#formula for secondary axis transformation
frm = as.formula(paste0("~.*",coeff[paste(i)]))
temp[[i]] <- ggplot(CO_all[[i]], aes(x=cutoff, group = Nr)) +
geom_line(aes( y= prop), color="blue3")+
geom_line(aes(y= mean), color = "Orange3")+
scale_y_continuous(name= "Proportion above cut-off",
sec.axis = sec_axis(trans=frm, name="mean grade [%]"))+
labs(title = i)
}
do.call("grid.arrange", c(temp, ncol=3))
In this report, six methods (three cokriging and three cosimulation variants) have been applied to investigate their relative suitability for a resource estimation problem in a part of a case study. The modelling was preceded by an exploratory data analysis to decide on the appropriate methods and data transformations. The results have shown that the error between the models is …. As such, the best performing model was …; however, the … .
The model shows a certain anisotropy in direction WNW-SSE. This anisotropy was only modelled in the .. approach.